Modeling of scour hole characteristics under turbulent wall jets using machine learning

The novelty of the present study lies in investigating the parameters that describe the scour hole characteristics caused by turbulent wall jets and in developing new mathematical relationships for them. Four significant parameters, i.e., depth of scouring, location of scour depth, height of the dune and location of dune crest, are identified to represent the complete phenomenon of scour hole formation. From the gamma test, densimetric Froude number, apron length, tailwater level, and median sediment size are found to be the key parameters that affect these four dependent parameters. Using previous data sets, multiple regression analysis (linear and non-linear) has been performed to establish the relationships between the dependent parameters and the influencing independent parameters. Further, artificial neural network-particle swarm optimisation (ANN-PSO) and gene expression programming (GEP) based models are developed using the available data. In addition, results obtained from these models are compared with the proposed regression equations, and the best models are identified using statistical performance parameters. The performance of the ANN-PSO model in predicting the four significant parameters, (RMSE = 1.512, R2 = 0.605), (RMSE = 6.644, R2 = 0.681), (RMSE = 6.386, R2 = 0.727) and (RMSE = 1.754, R2 = 0.636), is more satisfactory than that of the regression and other soft computing techniques. Overall, considering all the statistical parameters, the uncertainty analysis and the reliability index, the ANN-PSO model shows good accuracy and predicts better than the other presented models.

and Karbasi and Azamathulla 7 through five soft-computing techniques: artificial neural networks, support vector regression, gene expression programming, group method of data handling (GMDH) neural network and adaptive-network-based fuzzy inference system. Considering the issue of scour under wall jets, which poses a threat to the foundations of hydraulic structures, Aamir and Ahmad 8 proposed an empirical equation for the prediction of scour downstream of a rigid apron under wall jets. Chen et al. 9 conducted comprehensive laboratory experiments and also employed computational fluid dynamics (CFD) modeling to simulate the scour characteristics induced by turbulent wall jets. Li et al. 10 employed a neural network-based approach to calculate scour depth under wall jets, considering flow and sediment parameters as inputs. Scour characteristics downstream of stilling basins were studied by Farhoudi et al. 11 using a neuro-fuzzy model. Ebtehaj et al. 12 used the self-adaptive extreme learning machine technique to predict the equilibrium scour depth around bridge piers. Scour depth at seawalls was investigated by Pourzangbar et al. 13 through genetic programming (GP) and ANN. These recent studies revealed that soft-computing models exhibit greater accuracy than empirical equations. Scour depth near spur dikes was studied by Pandey et al. 14 using two innovative tree-based ensemble models, namely stacked boosting regression tree (SBRT) and stacked bagging regression tree (SBGT). Further, three robust AI-based techniques were introduced by Pandey et al. 15, namely gradient boosting decision tree (GBDT), cascaded forward neural network (CFNN), and kernel ridge regression (KRR), to predict the scour depth around a spur dike in cohesive sediment mixtures.
Experimental studies may involve physical scale models or field measurements to investigate scour phenomena. They help to establish empirical relationships between the influencing parameters and the magnitude of scour depth. The primary aim of these studies was to estimate the maximum scour depth; however, the scour profile and its spatial distribution have not been investigated in depth. To the authors' knowledge, many studies have addressed scour depth modelling, but none has focused specifically on modelling the four parameters that can completely define the scour hole development in both the vertical and horizontal dimensions. It must be noted that predicting the scour hole profile caused by turbulent wall jets is a complex task and relies on accurate flow conditions and sediment properties.
The prediction of the scour hole characteristics involves the calculation of scour depth, location of scour depth, height of the dune and location of dune crest, as shown in Fig. 1. Complete information on these four parameters can provide detailed knowledge of the specific scouring mechanism and its form. As recent studies have focused only on scour depth prediction through experimental, numerical, and machine learning approaches, models for the other three parameters (location of scour depth, height of the dune and location of dune crest) still need to be developed to describe the complete phenomenon of scour under turbulent wall jet conditions.
Aamir and Ahmad 8 have provided a wide range of data sets through extensive experimentation on a rigid apron under wall jets. They proposed empirical equations for scour depth with densimetric Froude number, sediment size/sluice opening ratio and tailwater level/sluice opening ratio as independent parameters. Aamir and Ahmad 16 also evaluated the performance of ANN and ANFIS models in estimating maximum scour depth using an extensive experimental data set of attached wall jets. Chatterjee et al. 17, Aderibigbe and Rajaratnam 18, Dey and Sarkar 19 and Aamir and Ahmad 8,20 have also developed regression based equations for scour depth prediction. These studies focussed only on predicting scour depth, leaving a gap between scour depth prediction and the description of the full scour hole characteristics. Therefore, an attempt has been made to develop two multiple regression based models (linear and non-linear) and two machine learning based models for each scour hole parameter, i.e., depth of scouring, location of scour depth, height of the dune and location of dune crest. As shown in Fig. 1, d_s is the maximum equilibrium scour depth (depth of scouring), x_s is the distance to asymptotic depth of scour from the end of the rigid apron (location of scour depth), d_d is the height of dune and x_d is the distance to maximum height of dune crest from the end of the rigid apron (location of dune crest). As AI based models, artificial neural network-particle swarm optimisation (ANN-PSO) and gene expression programming (GEP) are considered in the study.
Further research on new methods for the various parameters is needed to understand the complete phenomenon. Therefore, model development has been performed through multivariable regression analysis and two machine learning techniques, namely artificial neural network-particle swarm optimisation (ANN-PSO) and gene expression programming (GEP), using the available data sets of Aamir and Ahmad 8. The essential independent parameters considered in this study are densimetric Froude number, apron length, tailwater level, and median sediment size. Further, the models have been compared in terms of different statistical parameters.
Figure 1 depicts a definition sketch of the scour hole developed under a wall jet. In this figure, d_s = maximum equilibrium scour depth, x_s = distance to asymptotic depth of scour from the end of rigid apron, x_o = longitudinal extension of scour hole, d_d = height of dune, x_d = distance to maximum height of dune crest from the end of rigid apron, a = sluice opening, V = issuing jet velocity, d_t = tailwater depth, L = length of the rigid apron. Maximum scour depth depends on various parameters, viz. sluice opening, sediment size, jet Froude number and tailwater depth. Since the jet is two dimensional, the profile of the scour hole under the wall jet is also two dimensional across the width of the flume. Therefore, the figure represents any longitudinal section of the profile of the scour hole.
The application of machine learning and soft computing techniques to the prediction of scour depth at various hydraulic structures such as piers, abutments and spur dikes has been of interest to many researchers, but there has not been substantial work on soft computing techniques for predicting scour depth and other scour hole characteristics under the influence of wall jets. The novelty of this paper lies in using a wide range of data and predicting various scour characteristics, whereas the existing studies have focused only on the maximum scour depth. The soft computing models developed in the present study from such a wide range of data are intended to provide better prediction of scour characteristics over a wider range of parameters, which would support the reliable design of hydraulic structures. Additionally, the pivotal factors governing the scour characteristics have been identified using the Gamma test.
In the present study, four significant parameters, i.e., depth of scouring, location of scour depth, height of the dune and location of dune crest, are identified and modelled to represent the complete phenomenon of scour hole formation caused by turbulent wall jets.
The present study aims to develop new independent mathematical relationships for these four parameters using the previous data sets. Multiple regression analysis (linear and non-linear) has been performed to establish the relationships between the dependent parameters and the influencing independent parameters. Further, artificial neural network-particle swarm optimisation (ANN-PSO) and gene expression programming (GEP) based models are developed using the available data to compare the performance of all these models in predicting the four parameters. The intention of the present work is that, once the depth of scouring, location of scour depth, height of the dune and location of dune crest are predicted, the whole phenomenon can be visualised. This extends scour depth modelling, because scour depth prediction alone gives only the depth along the vertical axis, whereas the present study describes the whole phenomenon by predicting both its horizontal and vertical dimensions.

Dimensional analysis
Dimensional analysis is used as a classical tool to identify the variables affecting the equilibrium scour depth. The functional form of equilibrium scour depth downstream of a rigid apron (smooth and rough) under submerged wall jets can be written as: where ν = kinematic viscosity of water; σ_g = geometric standard deviation of sediments; ρ = mass density of water; and ρ_s = mass density of sediments. For a two-phase flow phenomenon involving sediment-water interaction, the terms g, ρ, and ρ_s can appropriately be grouped as one independent parameter Δg in the functional representation of d_s, where Δ = s − 1; s = relative density of sediments; g = acceleration due to gravity. Also, since the flow is turbulent, the effect of kinematic viscosity ν on maximum scour depth is negligible. σ_g also has a negligible effect upon maximum scour depth. Using the Buckingham π theorem, the following is obtained:
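The two relations referred to in this section can be sketched from the symbols defined here and in Fig. 1. This is a hedged reconstruction rather than a verbatim copy of the source equations: the exact argument list, the use of the sluice opening a as the repeating length scale, and the definition of the densimetric Froude number F_d shown below are assumptions, chosen to be consistent with the four dimensionless inputs (F_d, L/a, d_t/a, d_50/a) used later in the paper.

```latex
% Assumed dimensional functional form (before simplification)
d_s = f\left(V,\ a,\ d_t,\ L,\ d_{50},\ g,\ \rho,\ \rho_s,\ \nu,\ \sigma_g\right)

% After grouping g, \rho, \rho_s into \Delta g, neglecting \nu and \sigma_g,
% and applying the Buckingham \pi theorem (assumed result)
\frac{d_s}{a} = f\left(F_d,\ \frac{L}{a},\ \frac{d_t}{a},\ \frac{d_{50}}{a}\right),
\qquad
F_d = \frac{V}{\sqrt{\Delta\, g\, d_{50}}}
```

The same set of dimensionless groups would then apply to the other three dependent parameters once they are divided by a.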

Description of collected data
Chatterjee et al. 17, Aderibigbe and Rajaratnam 18, Dey and Sarkar 19 and Aamir and Ahmad 8 indicated that significant relationships exist between scour depth and densimetric Froude number, apron length, tailwater level and median sediment size. They all proposed models for scour depth under wall jets. However, the data sets provided by previous researchers do not contain complete information about the distance to the depth of maximum scour from the rigid apron (x_0), the height of dune (d_d) and the distance to maximum height of dune crest from the end of rigid apron (x_d). As Aamir and Ahmad 20 measured all of these parameters, as given in Table 1, their data sets are used in this study for modelling. All four parameters are made nondimensional by dividing them by the sluice opening (a). The study used a sample of 165 data points from the literature on turbulent wall jets to establish the relationships among the independent and dependent parameters. The range of the different parameters in the data used is summarized in Table 1. In this table, F is the issuing jet Froude number (= V/(g × d_50)^0.5) and g = acceleration due to gravity. A ratio of rigid apron length to sluice opening of L/a = 0 signifies the absence of a rigid apron, in which case the jet strikes the erodible bed directly as soon as it emerges from the sluice opening.
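The nondimensionalisation described above can be illustrated with a short Python sketch. The record below is a made-up example, not a row of Table 1; the dictionary keys, the function names and the SI units are assumptions introduced only for illustration. The Froude number follows the expression quoted in the text, F = V/(g × d_50)^0.5.

```python
import math

G = 9.81  # acceleration due to gravity in m/s^2


def jet_froude_number(velocity, d50, g=G):
    """Issuing jet Froude number F = V / sqrt(g * d50), as quoted in the text."""
    return velocity / math.sqrt(g * d50)


def nondimensionalise(record):
    """Divide every length by the sluice opening a (hypothetical record layout)."""
    a = record["a"]
    return {
        "L/a": record["L"] / a,
        "d_t/a": record["d_t"] / a,
        "d_50/a": record["d_50"] / a,
        "F": jet_froude_number(record["V"], record["d_50"]),
        "d_s/a": record["d_s"] / a,
        "x_0/a": record["x_0"] / a,
        "d_d/a": record["d_d"] / a,
        "x_d/a": record["x_d"] / a,
    }


# Hypothetical measurement in metres and m/s (illustrative values only).
sample = {
    "a": 0.01, "L": 0.0, "d_t": 0.10, "d_50": 0.001, "V": 1.0,
    "d_s": 0.05, "x_0": 0.20, "d_d": 0.03, "x_d": 0.50,
}
groups = nondimensionalise(sample)
```

Here `groups["L/a"]` is zero, which corresponds to the no-apron case mentioned above.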

Methodology
This section describes the approaches used in this study to develop the regression and machine learning based models for predicting the four parameters related to scour hole characteristics. Specifically, the current study investigates four questions: (a) which variables are predictive of the scouring phenomenon under turbulent wall jets, (b) how strongly the independent variables predict the different dependent parameters, (c) how regression based linear and nonlinear models can be developed, and (d) how their results compare with machine learning methods such as ANN-PSO and GEP. To address the first question, the maximum equilibrium scour depth (d_s), the distance to scour depth from the end of rigid apron (x_0), the height of dune (d_d) and the distance to maximum height of dune crest from the end of rigid apron (x_d) are modelled. To answer the second question, the Gamma test has been performed.

Gamma test (GT)
The Gamma Test is the first step in identifying the best input combination. It estimates the base mean square error (MSE) achievable for a given selection of input data. The selected input data combination can be used as part of a non-linear model's structure 21. In this research, 15 different combinations of the four input parameters (densimetric Froude number, apron length, tailwater level, and median sediment size) have been built in the winGamma software for each dependent parameter (i.e., d_s/a, x_0/a, x_d/a and d_d/a) for wall jet scouring. Of these, 7 combinations, including the best input combination, are presented for each dependent parameter in Tables 2, 3, 4 and 5. From Tables 2, 3, 4 and 5, it is observed that the combination of all four parameters, with mask [1111], can establish better models for all four scour hole parameters than the other combinations, because its Gamma and V-ratio values are lower and very close to zero.
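To make the mask notation and the Gamma statistic concrete, the following is a minimal Python sketch. It is not the winGamma implementation: it assumes the commonly described form of the test, in which the mean squared distance to the k-th nearest neighbour in input space (delta) is regressed against half the mean squared difference of the corresponding outputs (gamma), and the intercept of that line is taken as the Gamma statistic. The function names, the choice of p = 10 neighbours and the synthetic demo data are all assumptions.

```python
import itertools
import math
import random

INPUT_NAMES = ["F_d", "L/a", "d_t/a", "d_50/a"]  # order of mask bits (assumed)


def input_masks(n_inputs=4):
    """All non-empty on/off masks; '1111' keeps every input (2**4 - 1 = 15)."""
    return ["".join(bits) for bits in itertools.product("01", repeat=n_inputs)
            if "1" in bits]


def apply_mask(X, mask):
    """Keep only the input columns whose mask bit is '1'."""
    return [[v for v, bit in zip(row, mask) if bit == "1"] for row in X]


def gamma_test(X, y, p=10):
    """Return (Gamma, slope, V-ratio) from a p-nearest-neighbour regression."""
    m = len(y)
    delta = [0.0] * p
    gamma = [0.0] * p
    for i in range(m):
        # Squared distances from point i to every other point, nearest first.
        neighbours = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(m) if j != i
        )[:p]
        for k, (dist2, j) in enumerate(neighbours):
            delta[k] += dist2 / m
            gamma[k] += (y[j] - y[i]) ** 2 / (2 * m)
    # Least-squares line gamma = Gamma + slope * delta.
    mean_d = sum(delta) / p
    mean_g = sum(gamma) / p
    slope = (sum((d - mean_d) * (g - mean_g) for d, g in zip(delta, gamma))
             / sum((d - mean_d) ** 2 for d in delta))
    intercept = mean_g - slope * mean_d
    mean_y = sum(y) / m
    var_y = sum((v - mean_y) ** 2 for v in y) / m
    return intercept, slope, intercept / var_y


# Synthetic demo: a smooth target, with and without added noise.
rng = random.Random(0)
X_demo = [[rng.random(), rng.random()] for _ in range(300)]
y_clean = [math.sin(2 * math.pi * x1) + x2 for x1, x2 in X_demo]
y_noisy = [v + rng.gauss(0.0, 0.5) for v in y_clean]
gamma_clean, _, v_clean = gamma_test(X_demo, y_clean)
gamma_noisy, _, v_noisy = gamma_test(X_demo, y_noisy)
```

In such a workflow one would loop over `input_masks()`, call `gamma_test(apply_mask(X, mask), y)` for each mask, and prefer the mask with the smallest Gamma and V-ratio. For the noise-free demo target the V-ratio should be close to zero, while the noisy target should give a clearly larger Gamma.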
Another term, the V-ratio, returns a scale-invariant noise estimate between 0 and 1 and can be used to rank the GT performance. It is described as

$$V_{\text{ratio}} = \frac{\Gamma}{\sigma^{2}(y)}$$

where Γ = Gamma statistic and σ²(y) = variance of the output y. The V-ratio provides a standardized measure of the Gamma statistic and allows a judgment to be formed, independent of the output range, on how effectively the output can be modelled by a smooth function 22. It is therefore a measure of the predictability of the given outputs from the available data. It should be noted that the input combination with a low mean square error (MSE) and V-ratio value is regarded as the most suitable input combination for scour depth modeling. Before performing the analysis with the soft computing techniques, the data (both input and output) are normalised to the domain [0.05, 0.95] using Eq. (6) 23.

$$a_{norm} = 0.05 + 0.9\,\frac{a - a_{min}}{a_{max} - a_{min}} \qquad (6)$$

where a_norm = normalized input, a = original input, a_min = minimum of the input range, a_max = maximum of the input range.
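The normalisation step can be sketched in Python as follows. The sketch assumes a linear min-max map in which a_min is sent to 0.05 and a_max to 0.95, which is the only linear map consistent with the stated domain; the function names and the inverse transform are additions for illustration.

```python
def normalise(values, lo=0.05, hi=0.95):
    """Linearly map values so that min(values) -> lo and max(values) -> hi."""
    a_min, a_max = min(values), max(values)
    span = a_max - a_min
    if span == 0:
        raise ValueError("cannot normalise a constant series")
    return [lo + (hi - lo) * (a - a_min) / span for a in values]


def denormalise(norm_values, a_min, a_max, lo=0.05, hi=0.95):
    """Inverse map, used to bring model predictions back to original units."""
    return [a_min + (v - lo) * (a_max - a_min) / (hi - lo) for v in norm_values]


raw = [2.0, 4.0, 6.0]
scaled = normalise(raw)
restored = denormalise(scaled, min(raw), max(raw))
```

Keeping the targets away from 0 and 1 is a common precaution when the network output passes through a saturating activation; the inverse map is needed before error indices are computed in physical or dimensionless units.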
To answer the third and fourth objectives, each of these four parameters is modelled through both linear and non-linear Multiple Regression analysis and compared with two AI based models i.e., ANN-PSO and GEP, in terms of statistical error analysis.

Multiple regression analysis
Given the threat that scour under wall jets poses to hydraulic structures, the present study develops two regression based models with the available data as input parameters. Regression analysis is generally used to find the dependencies between a dependent parameter and one or more independent parameters 7,24. A multiple regression model can perform better than a uni-factorial or single regression model because it accounts for more influencing parameters: it analyses several variables together and derives the relationship between a dependent variable and several independent variables. In this study, two multivariable regression models are considered, namely multiple linear regression and multiple non-linear regression. In multiple linear regression (MLR), the resulting output function is a linear mathematical statement, represented as follows:

$$Y_1 = a_0 + a_1 X_1 + a_2 X_2 + \dots + a_n X_n$$

where Y_1 is the response variable, a_0-a_n are the constants of the equation, and X_1-X_n are the various independent variables. Multiple nonlinear regression (MNLR) is a form of regression analysis in which nonlinear combinations of the input and output parameters are analysed. MLR yields linear models, whereas MNLR can establish models of nonlinear relationships between influencing and response variables 7. The MNLR is represented as follows:
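A common nonlinear regression form, assumed here purely for illustration and not confirmed by the source, is the power law Y = a_0 · X_1^(a_1) · ... · X_n^(a_n), which becomes linear after taking logarithms. Under that assumption, both MLR and the power-law form can be fitted with ordinary least squares, as in the following Python sketch. The function names and the toy data are hypothetical, and the power-law fit requires strictly positive inputs and outputs.

```python
import math


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_mlr(X, y):
    """Least-squares [a0, a1, ..., an] for Y = a0 + a1*X1 + ... + an*Xn."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Normal equations (X^T X) a = X^T y.
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(XtX, Xty)


def fit_power_law(X, y):
    """Fit the assumed form Y = a0 * prod(Xi**ai) by regressing log Y on log Xi."""
    log_X = [[math.log(v) for v in r] for r in X]
    log_y = [math.log(v) for v in y]
    coef = fit_mlr(log_X, log_y)
    return [math.exp(coef[0])] + coef[1:]


# Toy data generated from known coefficients (illustrative only).
X_toy = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
y_lin = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in X_toy]
y_pow = [2.0 * x1 ** 0.5 * x2 ** 1.5 for x1, x2 in X_toy]
lin_coef = fit_mlr(X_toy, y_lin)
pow_coef = fit_power_law(X_toy, y_pow)
```

With noise-free toy data the fits recover the generating coefficients, which is a convenient check before applying the same routines to the dimensionless inputs (F_d, L/a, d_t/a, d_50/a).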

Artificial neural network-particle swarm optimization (ANN-PSO)
Soft computing tools for predicting scour depth have attracted much attention owing to their simplicity and accuracy in computation. In contrast to hydraulic structures such as piers, abutments, and spur dikes, only limited work has been conducted specifically on scour hole characteristics for the wall jet problem.
Here, both ANN-PSO and GEP are applied to the computation of all four parameters, namely depth of scouring, location of scour depth, height of the dune and location of dune crest. Setting the parameters for ANN-PSO and GEP requires careful consideration to ensure good performance and reliability. ANN-PSO is a hybrid computational technique that combines the capabilities of artificial neural networks (ANN) and the particle swarm optimization (PSO) algorithm. This approach is extensively employed in fields such as optimization, pattern recognition, data mining, and machine learning. The hybrid ANN-PSO approach aims to combine the strengths of ANN and PSO to improve the optimization process and enhance the performance of neural networks. The PSO algorithm helps to find superior sets of weights for the neural network, resulting in greater accuracy and faster convergence during training. This is done by efficiently searching the parameter space, exploring different weight combinations, and adjusting the ANN's parameters to find an optimal solution for the specified problem.
The hybrid ANN-PSO model is widely used in many different fields since it can attain higher accuracy in less time. The ANN-PSO approach starts with the initialization of a set of random particles. The population of particles is known as a swarm. This step specifies the positions of the particles, which represent the ANN connection parameters, such as biases and weights. Particle selection is normally done at random. Starting with a random population of solutions, the system iteratively improves these solutions in an attempt to find the best solution within a given search space. The hybrid model network is then trained using the particles' initial positions (that is, their initial biases and weights). The fitness of the trained model can be calculated from the difference between the predicted and observed output. With each iteration, the swarm is guided toward the optimum by each particle's ability to draw on the experience of the others. Each iteration tracks two values, the local best (p_id) and the global best (p_gd) 25,26. The 'local best' is the best position attained by a particle so far, whereas the 'global best' is the best position among all previously obtained individual best positions. Inertia weights are introduced (Mohandes 27), allowing particles to balance global and local exploration. The velocity and position of each particle are updated as

$$v_{id}^{t+1} = W_1\, v_{id}^{t} + C_1 R_1 \left(p_{id} - x_{id}^{t}\right) + C_2 R_2 \left(p_{gd} - x_{id}^{t}\right)$$

$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1}$$

where v_id and x_id are the velocity and position of particle i in dimension d, R_1 and R_2 are random values ranging from zero to unity, C_1 and C_2 are acceleration constants that typically vary between 1 and 3 with an interval of 0.25 32,33, p_id and p_gd represent the individual and global best values, and W_1 is the inertia weight. The acceleration coefficients C_1 and C_2, which represent the cognitive (individual) and social learning factors, respectively, are the weights assigned to the particle's best historical position and to the global best position, and they affect both the local and global optimal solutions during the optimization process. For ANN-PSO, key parameters such as inertia weight, cognitive and social coefficients, and swarm size significantly impact the convergence and stability of the optimization process. Literature such as Shi and Eberhart 28,29, Clerc and Kennedy 30, and Chatterjee and Siarry 31 provides foundational insights into these parameter settings, emphasizing empirical adjustments and nonlinear variations for dynamic adaptation. Both genetic algorithms (GA) and particle swarm optimization (PSO) are population-based methods that evolve candidate solutions. To find the best model, the model settings may be varied, such as the number of neurons (N) between 5 and 10 and the swarm population size between 10 and 200, with 1000 iterations 34. The flow chart for the methodology of the ANN-PSO model is depicted in Fig. 2.
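The swarm update described in this section can be sketched in Python as below. This is a generic global-best PSO under stated assumptions, not the authors' implementation: the function name, the default values (inertia weight 0.7, C_1 = C_2 = 1.5, 30 particles, 200 iterations), the zero initial velocities and the toy objective are all chosen for illustration. In ANN-PSO the position vector would hold the network weights and biases and the objective would be the training error; here a single linear neuron stands in for the network to keep the example short.

```python
import random


def pso_minimize(objective, dim, swarm_size=30, iterations=200,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimise objective(position) with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]
    pbest = [p[:] for p in pos]                    # local (individual) bests
    pbest_val = [objective(p) for p in pos]
    g = min(range(swarm_size), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best
    for _ in range(iterations):
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + cognitive pull + social pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Toy "network": one linear neuron y = w0 + w1 * x fitted to y = 1 + 2x.
xs = [i / 10 for i in range(11)]
ys = [1.0 + 2.0 * x for x in xs]


def mse(weights):
    """Mean squared error of the toy neuron for a given weight vector."""
    return sum((weights[0] + weights[1] * x - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)


best_w, best_mse = pso_minimize(mse, dim=2)
```

The ranges quoted in the text (C_1 and C_2 between 1 and 3, swarm sizes between 10 and 200) can be explored by calling `pso_minimize` in a loop over those settings and keeping the combination with the lowest validation error.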

Gene expression programming (GEP)
GEP is an extension of genetic programming that evolves computer programs such as mathematical expressions, decision trees, polynomial constructs, and logical expressions. Computer programs generated by GEP are encoded in linear chromosomes and are then expressed as expression trees (ETs) (Ferreira 2001). GEP is a full-fledged genotype/phenotype system, in which the genotype is completely separated from the phenotype. The approach employs the GEP design to form nonlinear functions for analysing nonlinear parameters 35,36. The GEP structure can be used as a guide to set the mutation, inversion, one-point, two-point, and gene recombination rates in order to accommodate different mathematical operators and produce the desired outcome 22,37,38. For GEP and genetic algorithms, the selection of genetic operator settings such as mutation rates, crossover rates, and population size is crucial. Foundational works by Goldberg 39, Srinivas and Patnaik 40, and Ferreira 41 presented the effects of these parameters on genetic diversity and convergence. Adaptive strategies for parameter tuning, as discussed by Eiben and Smit 42, further enhance algorithm performance by adjusting parameters in response to the evolving population dynamics. Several of the genetic operators used for chromosome genetic alteration are explained in the GEP scheme 43. The modelling includes five major steps. The first is to select the fitness function, in which the fitness f_i of an individual program i is evaluated as

$$f_i = \sum_{j=1}^{C_t} \left( M - \left| C_{(i,j)} - T_j \right| \right)$$

where M is the range of selection, C_(i,j) is the value returned by the individual chromosome i for fitness case j (out of C_t fitness cases), and T_j is the target value for fitness case j. The advantage of this type of fitness function is that it allows the system to find the optimal solution autonomously. Second, the set of terminals T and the set of functions F are chosen to create the chromosomes. In this study, for the four equations developed to predict d_s/a, x_0/a, x_d/a and d_d/a, the terminals include the four independent parameters (L/a, d_t/a, D_50/a, F_d). These parameters are derived from the optimal input combination obtained from the Gamma test (Tables 2, 3, 4, 5). Chromosomes represent complete solutions. A common length for chromosomes is 30-50 genes, depending on the complexity of the problem. In this study, chromosome lengths of 28 and 30 are considered. To determine the appropriate function set, it is essential to review previous investigations in this area. Consequently, four basic operators (+, −, *, /) and fundamental mathematical functions (power, exponential, ln) were applied for modelling.
The third major step is to choose the chromosomal architecture, specifically the length of the head and the number of genes. Initially, a single gene and a head length of two were used, with the number of genes and the head length incrementally increased one at a time during each run while monitoring the training and testing performances of each case. It was observed that using more than two genes and a head length greater than 8 did not significantly improve the training and testing performance of the GEP models. Therefore, a head length of 8 and three genes per chromosome are employed for each GEP model in this study. The fourth step is to choose the linking function. In this study, the addition operator is used as the linking function. The fifth and final step is to select the set of genetic operators that induce variation and their rates. A combination of all genetic operators (mutation, inversion, and crossover) is used for this purpose. The mutation rate, the probability of altering a gene, is typically set between 0.01 and 0.1. This rate introduces diversity while maintaining stability. The inversion rate, the probability of reversing a segment within a chromosome, is often set between 0.01 and 0.1. This helps preserve genetic material while creating new sequences. The crossover frequency, the likelihood of parent chromosomes exchanging segments, is generally high, around 0.3 to 0.9. This ensures significant genetic mixing. In the present study, the mutation rate, inversion rate and crossover frequency are taken as 0.044, 0.1 and 0.5, respectively.

The present work utilises the GEP to estimate the four scour hole parameters by adopting an innovative architecture of GEP structure.The flow chart of the methodology of GEP model is depicted in Fig. 3.
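The fitness evaluation used in the first GEP step can be illustrated with a short Python sketch. It assumes the absolute-error fitness form described in this section, in which each fitness case contributes the selection range M minus the absolute deviation from the target; the function names, the value M = 100 and the candidate expression are hypothetical and are not taken from GeneXpro Tools or from the fitted models.

```python
def gep_fitness(predictions, targets, selection_range=100.0):
    """Sum over fitness cases of (M - |C_ij - T_j|); higher is better."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum(selection_range - abs(c - t)
               for c, t in zip(predictions, targets))


def max_fitness(n_cases, selection_range=100.0):
    """Fitness of a perfect individual: M multiplied by the number of cases."""
    return selection_range * n_cases


def candidate(d0, d1, d2, d3):
    """Hypothetical evolved expression in the terminals d0..d3 (illustration)."""
    return d3 / (1.0 + d0) + d1 * d2


# Toy fitness cases: (L/a, d_t/a, D_50/a, F_d) rows with made-up targets.
cases = [(0.0, 10.0, 0.1, 8.0), (5.0, 12.0, 0.2, 6.0), (10.0, 8.0, 0.1, 11.0)]
targets = [9.0, 3.5, 1.5]
predicted = [candidate(*row) for row in cases]
score = gep_fitness(predicted, targets)
```

During evolution, the individual with the highest score would be retained, and a run could stop once the score reaches `max_fitness(len(targets))` or stops improving.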

Uncertainty and reliability analysis
Uncertainty and reliability analyses are crucial in evaluating model performance and in ensuring the credibility and utility of predictive models 44-48. Uncertainty analysis aims to define a reliable uncertainty interval, denoted as U95, which indicates the range within which the true outcome of an experiment is likely to fall. U95 is estimated based on the errors in the experimental measurement process, with the understanding that in approximately 95 out of 100 trials, the true outcome will lie within this interval. The U95 formula involves the weighted summation of squared differences between observed and predicted values. Reliability analysis evaluates a model's overall consistency, expressed as a percentage calculated through the relative average error (RAE). The reliability factor is set to 1 if the RAE is less than or equal to a threshold (typically 20%), and the model's reliability is determined as the average of these factors. Collectively, these analyses provide a comprehensive understanding of model behavior, enabling decision-makers to make more informed choices and enhancing the overall robustness and trustworthiness of predictive models across various domains.
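The reliability calculation described above can be sketched in Python as follows. The sketch rests on one reading of the text, namely that the relative error is evaluated per sample as |predicted − observed| / |observed| and that the reliability factor is 0 when the threshold is exceeded; the function name and the sample values are illustrative assumptions. The U95 interval is not coded here because its exact expression is not given in the text.

```python
def reliability_index(observed, predicted, threshold=0.20):
    """Fraction of samples whose relative error is within the threshold."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    factors = []
    for obs, pred in zip(observed, predicted):
        if obs == 0:
            raise ValueError("relative error is undefined for a zero observation")
        relative_error = abs(pred - obs) / abs(obs)
        # Reliability factor: 1 inside the threshold, 0 outside (assumed).
        factors.append(1 if relative_error <= threshold else 0)
    return sum(factors) / len(factors)


obs_demo = [10.0, 10.0, 10.0, 10.0]
pred_demo = [11.0, 12.0, 13.0, 9.0]
ri_demo = reliability_index(obs_demo, pred_demo)
```

In this toy case three of the four predictions fall within 20% of the observation, so the index is 0.75; multiplying by 100 expresses it as a percentage.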

Results and discussion
Using all the necessary data, both multiple linear regression analysis (MLRA) and multiple nonlinear regression analysis (MNLRA) are performed. The nondimensional dependent variables considered in this study are the maximum equilibrium scour depth (d_s/a), the distance to scour depth from the end of rigid apron (x_0/a), the height of dune (d_d/a) and the distance to maximum height of dune crest from the end of rigid apron (x_d/a). From the Gamma test, densimetric Froude number (F_d), apron length (L), tailwater level (d_t) and median sediment size (D_50) are found to influence these four dependent parameters. For modelling, apron length (L), tailwater level (d_t) and median sediment size (D_50) are made dimensionless by dividing them by the height of the gate opening (a), giving L/a, d_t/a and D_50/a, respectively. The variation of the dependent parameters with the input parameters is shown in Fig. 4a-d. The present research reports a rising trend of all dependent parameters with apron length (L/a), as shown in Fig. 4a. The reason for this trend is attributable to the dissipation of energy of the jet as it travels over the apron. Hence, longer aprons are able to dissipate more energy and reduce the erosive capacity of the jet. Similarly, rising trends are also visible for the variations of the dependent parameters with tailwater level (d_t/a), densimetric Froude number (F_d), and median sediment size (D_50), as shown in Fig. 4.
A number of single regression models for the maximum equilibrium scour depth (d_s/a), the distance to scour depth from the end of rigid apron (x_0/a), the height of dune (d_d/a) and the distance to maximum height of dune crest from the end of rigid apron (x_d/a) with different input parameters are established. After rigorous study, the best model for each pair (dependent vs independent) with a high coefficient of determination R² is identified. Then, multiple regression analysis has been performed and two equations (one for the linear and another for the nonlinear case) are obtained for each dependent parameter, as provided below. … 0.66 for x_0/a, 0.67 for x_d/a, 0.56 for d_d/a and 0.66 for d_s/a, 0.74 for x_0/a, 0.74 for x_d/a, 0.54 for d_d/a, respectively, which measures how much of the total variance is explained by the independent variables. Further, an attempt has been made to apply two machine learning approaches, ANN-PSO and GEP, to model the four parameters d_s/a, x_0/a, d_d/a and x_d/a.

Model development using ANN-PSO
In the ANN-PSO modelling, several trials were performed and the coefficients C_1 and C_2 were fixed at 1 and 2.5, 2 and 2.5, 1.5 and 2.5, and 1.5 and 2.5 for d_s/a, x_0/a, d_d/a and x_d/a, respectively. The error analysis results for the training data, the testing data, and the entire dataset were analysed for various swarm sizes and numbers of neurons (N) for each dependent parameter. It was observed that, as the swarm size increases for the same values of C_1 and C_2 while the number of neurons is kept constant, the values of R², E, and I_d decrease and the value of RMSE increases.

Model development using GEP
In this section, model development for the four dependent parameters using the GEP approach is described. The GEP expressions have been derived by incorporating all four independent input parameters (L/a, d_t/a, D_50/a, F_d), and the GeneXpro Tools 5.0 software package is used for this analysis. Using normalized data, four attempts have been made, varying the chromosome number, fitness function, and number of runs, to model the wall jet scouring. Table 6 shows the corresponding parameters of the optimized GEP model.
The expression trees for the models of d_s/a, x_0/a, d_d/a and x_d/a are presented in Fig. 5a-d, respectively. In these expression trees, d_0 = L/a, d_1 = d_t/a, d_2 = D_50/a and d_3 = F_d. In Sub-ET 1, 2 and 3 (Fig. 5a), C_7 and C_9 are constants, and their values are 3.45 and −5.56, respectively, for the model of d_s/a. In Sub-ET 1 (Fig. 5b), C_2 is a constant with a value of −8.93 for the model of x_0/a. In Sub-ET 2 and 3 (Fig. 5c), C_4 and C_7 are the constants, and their values are 2.971 and −0.376, respectively, for the model of x_d/a. In Sub-ET 1 and 3 (Fig. 5d), C_4 is a constant with values of −3.114 and 3.145, respectively, for the model of d_d/a. The equations derived from the expression trees are presented in Eqs. (16)-(19). Figure 6a-d shows the relationship between observed and predicted values for the models of d_s/a, x_0/a, x_d/a and d_d/a, respectively. It is observed that the ANN-PSO predictions are in good agreement with the observed values for all four models, whereas GEP gives unsatisfactory results in the present study.

Performance of uncertainty and reliability analysis
To perform a comprehensive statistical assessment of the proposed models, two indices, namely the confidence interval (U95) and the reliability index (RI), are computed. The statistical evaluation of the present models, highlighting their predictive capabilities and robustness using uncertainty analysis and the reliability index, is presented in Table 7.
Table 7 shows the confidence interval (U95) and reliability index (RI) of MLRA, MNLRA, ANN-PSO and GEP in predicting d s /a , x 0 /a , x d /a and d d /a . The ANN-PSO model gave the lowest values of the confidence interval (U95), i.e., 0.383, 2.539, 2.805 and 0.268, when compared to MLRA (0.402, 2.604, 3.101 and 0.293), MNLRA (0.415, 2.598, 3.063 and 0.301) and GEP (0.483, 28.800, 19.276 and 0.409) for predicting d s /a , x 0 /a , x d /a and d d /a , respectively. Additionally, the predictions of d s /a , x 0 /a , x d /a and d d /a provided by ANN-PSO are more reliable (RI = 0.573, 0.591, 0.576 and 0.548) than those of the other present models. MNLRA is slightly less reliable (RI = 0.521, 0.545, 0.570 and 0.497) than ANN-PSO in predicting d s /a , x 0 /a , x d /a and d d /a . GEP shows wider confidence intervals (U95) and a lower reliability index, indicating higher uncertainty and a less reliable model in predicting d s /a , x 0 /a , x d /a and d d /a . This analysis suggests that ANN-PSO provides a more consistent and reliable model for the prediction of d s /a , x 0 /a , x d /a and d d /a.

Statistical error analysis
This section illustrates the performance of the two soft-computing models and the two multiple regression models in predicting d s /a , x 0 /a , x d /a and d d /a . To assess the strength of the present approaches, seven statistical indices are considered, including the root mean square error (RMSE) and the coefficient of determination (R 2 ), together with MAE, MAPE, MSE, E and I d ; the results are presented in Table 8.
From Table 8, it is found that for both d s /a and x 0 /a , the error indices, i.e., MAE, MAPE, MSE and RMSE, are lower for MLRA and MNLRA than for ANN-PSO and GEP. For both x d /a and d d /a , however, these error indices are lower for ANN-PSO than for MLRA, MNLRA and GEP. The R 2 value is highest for the ANN-PSO model for all predicted parameters. The E and I d values are also close to 1 for the ANN-PSO models for the three predicted parameters other than x 0 /a ; for x 0 /a , the E and I d values are close to 1 for the MLRA model. Comparing all the statistical parameters, the ANN-PSO model shows better results than the other presented regression and soft-computing techniques (Table 8).
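As a rough illustration of how such indices are typically computed from observed and predicted values, the following sketch evaluates all seven. The definitions are assumptions where the text does not spell them out: R 2 is taken as the squared Pearson correlation, E as the Nash-Sutcliffe efficiency, and I d as Willmott's index of agreement. The function name `error_indices` and the sample numbers are hypothetical and are not data from the study.

```python
import math

def error_indices(obs, pred):
    """Return MAE, MAPE (%), MSE, RMSE, R2, E and Id for paired samples.

    Assumed definitions: R2 = squared Pearson correlation,
    E = Nash-Sutcliffe efficiency, Id = Willmott index of agreement.
    Observed values must be non-zero (MAPE) and not all identical.
    """
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    err = [p - o for o, p in zip(obs, pred)]

    mae = sum(abs(e) for e in err) / n
    mape = 100.0 * sum(abs(e / o) for e, o in zip(err, obs)) / n
    mse = sum(e * e for e in err) / n
    rmse = math.sqrt(mse)

    s_oo = sum((o - mean_o) ** 2 for o in obs)
    s_pp = sum((p - mean_p) ** 2 for p in pred)
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    r2 = s_op ** 2 / (s_oo * s_pp)

    sse = sum(e * e for e in err)
    e_ns = 1.0 - sse / s_oo  # Nash-Sutcliffe efficiency
    denom = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2
                for o, p in zip(obs, pred))
    i_d = 1.0 - sse / denom  # Willmott index of agreement

    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": rmse,
            "R2": r2, "E": e_ns, "Id": i_d}

# Hypothetical observed and predicted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
indices = error_indices(observed, predicted)
```

Lower MAE, MAPE, MSE and RMSE and values of R 2 , E and I d closer to 1 indicate better agreement, which is the basis of the comparison above.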

Conclusions
The present study has focused on modelling four geometrical variables that represent the complete scour hole formation, based on experimental data sets. A total of 167 data points have been utilised for modelling the following parameters: the maximum equilibrium scour depth (d s /a) , the distance to the maximum scour depth from the end of the rigid apron (x 0 /a) , the height of the dune (d d /a) and the distance to maximum height of dune

Figure 1.Definition sketch of developed scour hole under wall jet.

Figure 2. Flow chart for the methodology of the ANN-PSO algorithm.

Figure 5. (a) Expression tree for wall jet scouring of d s /a . (b) Expression tree for wall jet scouring of x 0 /a . (c) Expression tree for wall jet scouring of x d /a . (d) Expression tree for wall jet scouring of d d /a.

Figure 6. (a) Observed vs predicted values for all models of d s /a . (b) Observed vs predicted values for all models of x 0 /a . (c) Observed vs predicted values for all models of x d /a . (d) Observed vs predicted values for all models of d d /a.

Table 1. Range of different parameters of the used data.

Table 2. Selection of best input combination using Gamma test for d s /a.

Table 3. Selection of best input combination using Gamma test for x 0 /a.

Table 4. Selection of best input combination using Gamma test for x d /a.

Table 5. Selection of best input combination using Gamma test for d d /a.

Table 6. Parameters of the optimized GEP model for wall jet scouring.

Table 7. Comparison of performance results for the uncertainty and reliability analysis. The best results are shown in bold.

Table 8. Error analysis of different approaches in estimating d s /a , x 0 /a , d d /a and x d /a for wall jet scouring.